Abstract — In this paper, we consider the problem of power 
efficient uplink scheduling in a Time Division Multiple Access 
(TDMA) system over a fading wireless channel. The objective is to 
minimize the power expenditure of each user subject to satisfying 
individual user delay constraints. We make the practical assumption that the 
system statistics are unknown, i.e., the probability distributions of 
the user arrivals and channel states are unknown. The problem 
has the structure of a Constrained Markov Decision Problem 
(CMDP). Determining an optimal policy for the CMDP 
faces the problems of state space explosion and unknown system 
statistics. To tackle the problem of state space explosion, we 
suggest determining the transmission rate of a particular user in 
each slot based on its channel condition and buffer occupancy 
only. The rate allocation algorithm for a particular user is a 
learning algorithm that learns about the buffer occupancy and 
channel states of that user during system execution and thus 
addresses the issue of unknown system statistics. Once the rate 
of each user is determined, the proposed algorithm schedules 
the user with the best rate. Our simulations within an IEEE 
802.16 system demonstrate that the algorithm is indeed able 
to satisfy the user specified delay constraints. We compare the 
performance of our algorithm with the well known M-LWDF 
algorithm. Moreover, we demonstrate that the power expended 
by the users under our algorithm is quite low. 

Index Terms — Multi-user Fading Channel, Markov Decision 
Process, Energy Efficient Scheduling 
I. Introduction 

Broadband wireless networks like IEEE 802.16 [1] and 3G 
cellular [2] are expected to provide Quality of Service (QoS) 
for emerging multimedia applications. One of the challenges 
in providing QoS is the time varying nature of the wireless 
channel due to multipath fading [3]. Moreover, for portable 
and hand-held devices, energy efficiency is also an important 
consideration. 

In a multi-user wireless system, recent studies [4], [5] 
suggest that since the wireless channel fades independently 
across different users, this diversity can be exploited by op- 
portunistically scheduling the user with the best channel gain. 
This leads to significant performance gain in terms of total 
system throughput. Such scheduling algorithms that exploit the 
characteristics of the physical channel to satisfy some network 
level QoS performance metrics are referred to as cross layer 
scheduling algorithms [6]. Power required to transmit reliably 
at a certain rate under better channel conditions is much less 
than that required under poorer channel conditions at the same 
rate [3]. This suggests that in order to save power, one should 
transmit at higher rates under better channel conditions and defer 
transmission under poorer conditions, which leads to queuing 
delays. Moreover, since transmission power is 
an increasing and strictly convex function of the transmission 
rate [3], power efficiency can also be achieved by transmitting 
the data at lower rates, albeit at the cost of increased queuing 
delay, thus leading to a power-delay tradeoff. 

In this paper, we consider a single cell multi-user wireless 
uplink system with Time Division Multiple Access (TDMA). 
For such a system, we consider the problem of determining the 
user to be scheduled in each time slot so that the average trans- 
mission power expended by each user is minimized subject to 
a constraint on the average queuing delay experienced by each 
user. Moreover, we assume a peak power constraint, i.e., in 
each slot the transmission power of a user is less than or equal 
to a certain maximum value. This scenario may correspond to 
a base station scheduling users on an uplink in an IEEE 802.16 
system to satisfy the delay constraint of each user. 

There is a copious literature on cross layer scheduling 
algorithms. See [7] for a succinct review. The scheduling 
problem is typically formulated as an optimization problem 
with an objective of efficiently allocating resources such as 
time, frequency bands, power, codes etc. to the users under 
physical layer (wireless channel) and/or network layer QoS 
constraints. Various QoS constraints have been considered in 
the literature like system throughput, minimum rate, maximum 
delay, delay bound, queue stability and fairness. A scheduling 
policy is an allocation rule that allocates these resources 
based on parameters like channel conditions of the users, their 
queue lengths etc. In this paper, we concentrate on efficiently 
allocating power and rates to users based on their channel 
condition and queue length. The power allocation policy is 
considered feasible if it satisfies certain average or peak power 
constraints. On the other hand, the rate allocation policy is 
considered feasible if the physical layer can deliver the data 
reliably to the users at a given rate. The set of all feasible rate 
tuples is called the feasible capacity region [7]. 

A scheduling policy is considered stable if the expected 
queue lengths are bounded under the policy. Many scheduling 
policies proposed in the literature have considered stability as 
a QoS criterion. In [4], the authors determine the throughput 
capacity region of a multi-access system, i.e., the set of all 
feasible rates with average power constraints. In [8], the 
authors have shown that the throughput capacity region is 
the same as the multi-access stability region (i.e., the set of all 
arrival vectors for which there exists some rate and power 
allocation policy that keeps the system stable). A scheduler is 
termed throughput-optimal if it can maintain the stability of the 
system as long as the arrival rate is within the stability region. 
Throughput optimal scheduling policies have been explored in 
[4], [9]. Longest Connected Queue (LCQ) [10], Exponential 
(EXP) [11], Longest Weighted Queue Highest Possible Rate 
(LWQHPR) [12] and Modified Longest Weighted Delay First 
(M-LWDF) [13] are other well known throughput optimal 
scheduling policies. In [14], the authors define the notion 
of delay limited capacity of a multi-access system, i.e., the 
maximum rate achievable such that the delay is independent 
of the fading characteristics. 

While throughput-optimal scheduling policies maintain the 
stability of the queueing system, they do not necessarily 
guarantee small queue lengths and consequently lower delays. 
Delay-optimal scheduling deals with optimal rate and power 
allocation such that the average queue length and hence average 
delay are minimized for arrival rates within the stability 
region under average and peak power constraints. Due to the 
nature of the constraints, there is no loss of optimality in 
choosing the rate and power allocation policies separately [7]. 
Hence to simplify the problem, one can choose any stationary 
power allocation policy that satisfies the peak and average 
power constraints. The delay optimal policy therefore deals 
with optimal rate allocation for minimizing delays under 
a given power allocation policy. It has been shown that the 
Longest Queue Highest Possible Rate (LQHPR) policy [15] 
(besides being throughput optimal) is also delay optimal for 
any symmetric power control under symmetric fading provided 
that the packet arrival process is Poisson and packet length is 
exponentially distributed. 

Apart from throughput and delay optimal policies, opportunistic 
scheduling with various fairness constraints has been 
explored in [16], [17]. 

In this paper, our focus is on rate allocation with a constraint 
on peak power as well as average queueing delay which acts 
as the QoS metric. This problem for a single user wireless 
channel without the peak power constraints has been explored 
in the pioneering work of [18]. The problem with many 
generalizations on arrival and channel gain processes has 
been considered in subsequent papers [19], [20], [21], [22], 
[23], [24], [25]. In most of these papers, the scheduling 
policy has been formulated as a control policy within the 
Markov Decision Process (MDP) framework. However, only 
structural results of the optimal policy are available under 
various assumptions and that too for a single user scenario 
only. There is very little work on extending the vast body of 
literature on delay constrained power efficient scheduling to 
the multi-user scenario. Recently, in [26], the author has extended 
the asymptotic analysis of Berry-Gallager [18] for exploiting 
the power-delay tradeoff in multi-user system. The objective 
is to minimize the total power on the downlink subject to 
user queue stability constraints. The author using the concept 
of Lyapunov Drift Steering has also given an algorithm that 
comes within a logarithmic factor of achieving the Berry- 
Gallager power-delay bound. However, on the downlink, the 
base station typically transmits with a constant maximum 
power sufficient to reach the farthest user and hence power 
minimization is not a major concern. Moreover, minimizing 
the sum power can lead to unfairness, i.e., users with better 
average channel conditions might get a far higher share of the 
bandwidth than the users who have relatively poor average 
channel conditions. On the other hand, for the uplink, the 
problem is to minimize the power of each user subject to 
individual delay constraint, which has not been addressed in 
the literature so far. 

In [27], the author has extended the analysis for single 
user case to the multi-user case, albeit with only two users 
which can be applicable for the uplink also. Beyond two users, 
the problem becomes unwieldy to gain any useful insight, 
primarily due to large state space. For the two user case, 
the author has given an elegant near optimal policy where 
each user's rate allocation is determined by the joint channel 
states across users and the user's own queue state. Thus 
each user's queue evolution process behaves as if it were 
controlled by a single user policy. However, computation of 
user's transmission power still takes into account the joint 
channel and queue state processes. 

Even for the single user case [19], [20], [21], [22], [23], 
[24], [25], practical implementation of the optimal policy is far 
from simple. This is because knowledge of the probability 
distributions of the arrival and channel gain processes is required 
for computing the optimal policy. This knowledge is not 
available in practice. While we have addressed this limitation 
by formulating an on-line algorithm within the stochastic 
approximation framework in [28], this algorithm still deals 
with the single user scenario. This algorithm does not assume 
any explicit knowledge of the probability distributions of the 
channel gain and arrival processes. In this paper, we consider 
a multi-user wireless system. The state of the system is 
defined as the minimum information required by the scheduler 
for making scheduling decisions. For the multi-user scenario 
considered in this paper, the state space is considerably larger 
than that for the single user scenario. We illustrate 
this with a simple example. Let us assume that the channel 
condition of a user can be represented using 8 states. This 
is a practical assumption and has been justified in [29]. Let 
us assume that each user has a buffer in which at most 50 
packets can be stored. For a single user system, the channel 
state and buffer occupancy of the user forms the state of the 
system in any time slot. The number of states is 8 x 50 = 400. 
Now consider a multi-user system with 4 users. In this case, 
the state of the system consists of the channel state and buffer 
occupancy of each user. The state space consists of 
$50^4 \times 8^4 = 2.56 \times 10^{10}$ states. Furthermore, the number of states increases 
exponentially with the number of users. Hence determining the optimal 
policy by estimating the dynamic programming value function 
would take a prohibitively long time. Hence in this paper, we 
propose an alternate approach. 
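
The growth described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `state_space_size` and its default arguments are ours, chosen to match the 8 channel states and 50 buffer levels of the example.

```python
def state_space_size(num_users: int, channel_states: int = 8,
                     buffer_levels: int = 50) -> int:
    """Count joint (buffer occupancy, channel state) combinations.

    Each user contributes channel_states * buffer_levels local states,
    and the joint state is the Cartesian product over all users.
    """
    per_user = channel_states * buffer_levels
    return per_user ** num_users


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} user(s): {state_space_size(n):.3e} states")
```

With the default values, one user gives 400 states and four users give 2.56 x 10^10 states, in agreement with the figures quoted in the text.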

In our approach, each user's queue evolution behaves as if 
it were controlled by a single user policy. Depending on each 
user's channel state and queue size, the algorithm allocates 
a certain rate to each user in a slot using the single user 
algorithm outlined in this paper. The algorithm then schedules 
the user with the highest rate in a slot. From the structural 
properties of the optimal policy for a single user scenario, it is 
well known that the optimal policy is increasing in queue 
length and channel gain [27]. Thus more packets are 
transmitted when the queue length is greater or the channel 
gain is higher. Hence a user transmitting at a high rate has 
either very good channel condition, or large queue length. 
Scheduling such a user, therefore, either leads to its power 
savings or aids in satisfying the delay constraint. 

The scheduling algorithms proposed in the literature like 
EXP scheduler [11], LQHPR scheduler [15], M-LWDF [13] 
scheduler require the queue length information for deter- 
mining the scheduling decision. In the downlink scenario, 
this information is readily available to the scheduler residing 
at the base station. However, in the uplink scenario, this 
information needs to be communicated by the users to the 
scheduler. Communicating the queue length information poses 
a significant overhead. In our approach, each user determines 
the rate at which it would transmit if it were scheduled in 
a slot. All the users report these rates to the base station. 
The base station then schedules the user with the highest rate. 
Thus by communicating the rates directly, we avoid the queue 
length communication overhead. 
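
A minimal sketch of the base station's selection step is given below. The function name `schedule_highest_rate` is hypothetical, and the tie-breaking rule (lowest user index wins) is our assumption, since the text does not specify how ties are resolved.

```python
from typing import Sequence


def schedule_highest_rate(reported_rates: Sequence[int]) -> int:
    """Return the index of the user to schedule in the current slot.

    reported_rates[i] is the rate user i would use if scheduled.
    Ties are broken in favour of the lowest index (an assumption).
    """
    if not reported_rates:
        raise ValueError("at least one user must report a rate")
    # max() returns the first maximal element, giving lowest-index ties.
    return max(range(len(reported_rates)), key=lambda i: reported_rates[i])


if __name__ == "__main__":
    print(schedule_highest_rate([3, 7, 7, 1]))
```

Note that the base station only needs one number per user per slot; the queue lengths themselves never leave the user devices.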

The IEEE 802.16 system is an emerging system for broad- 
band wireless access and is expected to provide QoS to the 
users. Through our simulations in an IEEE 802.16 system, 
we demonstrate that the algorithm is indeed able to satisfy 
the delay constraints of the users. Moreover, we demonstrate 
that the power expenditure of a user is commensurate with its 
delay requirement, the average arrival rate and average channel 
conditions. The higher the tolerable delay, the lower the average arrival rate, 
and the better the average channel conditions, the lower is the 
power expenditure. 

The contributions of this paper are summarized as follows: 

1) We formulate the problem of minimizing the average 
power expended by each user subject to a constraint 
on individual user delay as a constrained optimization 
problem. To the best of our knowledge, this problem has 
not been studied in a multi-user uplink scenario. 

2) We propose an online algorithm that does not require the 
knowledge of the probability distributions of the channel 
states and the arrivals of the users. 

3) The computational complexity of our approach increases 
only linearly with the number of users. 

4) The communication overhead of our approach is low 
and hence the algorithm is suitable for practical imple- 
mentation. The algorithm satisfies the delay constraints 
of the users. We demonstrate the power efficiency of 
our algorithm through comparison with the M-LWDF 
algorithm within an IEEE 802.16 system simulation. 

The rest of the paper is organized as follows. In Section II, 
we present the system model. We formulate the problem as an 
optimization problem in Section III, where we show that the 
problem has the structure of a Constrained Markov Decision 
Problem (CMDP). We discuss the issues like large state space 
and unknown system model in determining an optimal solution 
using the traditional CMDP solution techniques. In Section 
IV, we consider an extension to the traditional single user 
scenario based on transmitter induced errors. In Section V, 
we propose an online algorithm that is based on the extension 
to the single user scenario detailed in Section IV. We also 
discuss the implementation issues. We present the simulation 
setup and discuss results in Section VI. Finally, we conclude 
in Section VII. 

Fig. 1. System Model 

II. System Model 

As illustrated in Figure 1, we consider uplink transmissions 
in a TDMA system with N users, i.e., time is divided into slots 
of equal duration and only one user is allowed to transmit in a 
slot. We assume that the slot duration is normalized to unity. 
The base station is a centralized entity that schedules the users 
in every slot. We assume a fading wireless channel where the 
channel gain is assumed to remain constant for the duration 
of the slot and to change in an independent and identically 
distributed (i.i.d.) manner across slots. This model is called 
the block fading model [18]. We assume that the fading 
across users is also i.i.d. Under these assumptions, if a user 
$i$ transmits a signal $y_n^i$ in slot $n$, then the received signal 
$Y_n^i$ can be expressed as, 

$$Y_n^i = H_n^i y_n^i + G_n, \tag{1}$$

where $H_n^i$ denotes the complex channel gain due to fading and 
$G_n$ denotes the complex additive white Gaussian noise with 
zero mean and variance $N_0$. Let $X_n^i = |H_n^i|^2$ be the channel 
state for user $i$ in slot $n$. Practically, $H_n^i$ is a continuous random 
variable and hence so is $X_n^i$. However, in this paper we assume 
that $X_n^i$ takes only finite and discrete values from a set $\mathcal{X}$. This 
assumption has been justified in [18], [19]. In this paper, we 
assume that the distribution of $H_n^i$ and hence that of $X_n^i$ is 
unknown. 

Each user possesses a finite buffer of $B$ bits. Bits arrive 
into the user buffer and are queued until they are transmitted. 
The arrival process for each user is assumed to be i.i.d. across 
slots. Let $A_n^i$ denote the number of bits arriving into the user 
$i$ buffer in slot $n$. We assume that the random variable $A_n^i$ 
takes values from a finite and discrete set $\mathcal{A} = \{0, \ldots, A\}$. 
Like $X_n^i$, we assume that the distribution of $A_n^i$ is unknown. 
Let $Q_n^i$ denote the queue length or buffer occupancy of user 
$i$ in slot $n$. Let $U_n^i$ denote the number of bits transmitted by 
user $i$ in slot $n$. We assume that $U_n^i$ takes values from the set 
$\mathcal{U} = \{0, \ldots, B\}$. Let $I_n^i$ be an indicator variable that is set to 
1 if user $i$ is scheduled in slot $n$ and is set to 0 otherwise. Let 
$\mathbf{I}_n$ be the vector $[I_n^1, \ldots, I_n^N]$. Note that since only one user 
can transmit in a slot, only one element of $\mathbf{I}_n$ is equal to 1 and 
the rest are 0. Let $\mathcal{I}$ be the set of all possible $N$ dimensional 
vectors with one element equal to 1 and the rest being 0. Let 
$R_n^i$ denote the number of bits that the user $i$ transmits in a slot 
if it is scheduled. Then $U_n^i$ can be represented as $U_n^i = I_n^i R_n^i$. 
Moreover, since a user can at most transmit all the bits in its 
buffer in a slot, $R_n^i \le Q_n^i$. Since we assume that the slot length 
is normalized to unity, $U_n^i$ is the rate at which user $i$ transmits 
in slot $n$. Let $\mathbf{U}_n$ be the vector $[U_n^1, \ldots, U_n^N]$, $\mathbf{U}_n \in \mathcal{U}^N$. 

The buffer evolution equation for user $i$ can be expressed 
as, 

$$Q_{n+1}^i = \max\{Q_n^i - U_n^i, 0\} + A_{n+1}^i. \tag{2}$$

The buffer size $B$ is large compared to the arrival rate, 
thus we can neglect buffer overflow in the buffer evolution 
equation. 
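
The buffer evolution in (2) can be sketched as a one-line update. The function name `next_queue` and its argument names are ours; as in the text, buffer overflow is neglected, and a user that is not scheduled transmits nothing.

```python
def next_queue(q: int, r: int, scheduled: bool, arrivals: int) -> int:
    """One step of the per-user buffer evolution.

    q         -- buffer occupancy at the start of the slot
    r         -- bits the user would send if scheduled (must not exceed q)
    scheduled -- whether this user was picked by the scheduler
    arrivals  -- bits arriving for the next slot
    """
    if not 0 <= r <= q:
        raise ValueError("a user cannot send more bits than it has queued")
    u = r if scheduled else 0          # bits actually transmitted
    return max(q - u, 0) + arrivals


if __name__ == "__main__":
    print(next_queue(10, 4, True, 3))    # scheduled user drains 4 bits
    print(next_queue(10, 4, False, 3))   # unscheduled user only accumulates
```

The second call illustrates why the per-user problems are coupled: every slot given to one user lets the queues of all other users grow.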

From [18], the power required for error-free or reliable 
communication at a rate $U_n^i = u$ bits/sec when $X_n^i = x$ is 
given by, 

$$P(x, u) = \frac{W N_0}{x}\left(2^{u/W} - 1\right), \tag{3}$$

where $W$ is the bandwidth in Hz. Note that for a given $x$, 
the transmission power $P(x, u)$ is an increasing and strictly 
convex function of $u$. Let $\bar{P}$ denote the peak power constraint. 
Let $\bar{R}_n^i$ be the maximum rate at which user $i$ can 
transmit in slot $n$ when the channel condition is $X_n^i$ while 
satisfying the peak power constraint (i.e., $P(X_n^i, \bar{R}_n^i) \le \bar{P}$). 
Then the set of feasible rates for user $i$ in slot $n$ is 
$\mathcal{F}_n^i = \{0, \ldots, \min(\bar{R}_n^i, Q_n^i)\}$. 
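
The power function in (3) and the resulting feasible rate set can be sketched as follows. The function names, the default values W = 1 and N0 = 1, and the numbers in the usage lines are illustrative assumptions, not values from the paper.

```python
def tx_power(x: float, u: float, W: float = 1.0, N0: float = 1.0) -> float:
    """Power needed to send at rate u when the channel state is x."""
    return (W * N0 / x) * (2 ** (u / W) - 1)


def feasible_rates(x: float, q: int, p_peak: float,
                   W: float = 1.0, N0: float = 1.0) -> list[int]:
    """Integer rates allowed by both the peak power and the buffer content.

    Because tx_power is increasing in u, the result is always a
    contiguous range starting at 0.
    """
    return [u for u in range(q + 1) if tx_power(x, u, W, N0) <= p_peak]


if __name__ == "__main__":
    print(feasible_rates(x=1.0, q=10, p_peak=7.0))   # limited by peak power
    print(feasible_rates(x=1.0, q=2, p_peak=7.0))    # limited by the buffer
    print(tx_power(2.0, 3), tx_power(1.0, 3))        # better channel, less power
```

The last line shows the effect exploited by opportunistic scheduling: doubling the channel state halves the power needed for the same rate.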

We assume that the users specify their QoS requirements in 
terms of the average packet delay requirements. These delay 
requirements of the users are known a priori to the scheduler. 
By Little's law [30], the average delay D is related to the 
average queue length Q as, 

Q = aD, (4) 

where a is the average arrival rate. In the rest of the paper, we 
treat average delay as synonymous with average queue length 
and ignore the proportionality constant a. 

III. Problem Formulation 

In this section, we formulate the problem as a constrained 
optimization problem within the Constrained Markov Decision 
Process (CMDP) framework. 

A. Formulation as a Constrained Optimization Problem 

The user devices have limited battery power, hence it is 
essential to design transmission policies that conserve battery 
power. The power-delay tradeoff can be exploited to save 
power at the expense of extra delay. Moreover, multi-user diversity 
can be exploited to schedule a user with better channel 
state. Such a user requires less power while transmitting at 
a certain rate as compared to when it has a poorer channel 
state. However, this also incurs additional delay. The objective 
is to design a joint rate allocation and scheduling scheme that 
minimizes the power expenditure of each user subject to the 
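satisfaction of the individual delay constraints. The average 
power consumed by a user $i$ over a long period of time can 
be expressed as, 

$$\bar{P}^i = \limsup_{M \to \infty} \frac{1}{M} \mathbb{E} \sum_{n=1}^{M} P(X_n^i, U_n^i). \tag{5}$$

The average queue length of a user $i$ over a long period of 
time can be expressed as, 

$$\bar{Q}^i = \limsup_{M \to \infty} \frac{1}{M} \mathbb{E} \sum_{n=1}^{M} Q_n^i. \tag{6}$$

Each user $i$ wants its average queue length to remain below a 
certain value, say, $\delta^i$. Hence the problem becomes, 

$$\text{Minimize } \bar{P}^i \text{ subject to } \bar{Q}^i \le \delta^i, \quad i = 1, \ldots, N. \tag{7}$$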

Remark 1: Dependence between problems: Note that the 
N problems formulated in (7) are not independent. This is 
because in a TDMA system, only one user can be scheduled in 
a slot. Consequently, the scheduling decision in a slot impacts 
the buffer occupancy of all the users in the future slots. 



B. Notion of Optimal Solution 

The problem in (7) is a multi-objective optimization problem 
with $N$ objectives and $N$ constraints. There can be 
multiple average power vectors that can be considered as 
optimal. Hence it is necessary to precisely define the properties 
of an optimal solution sought by us. We seek Pareto 
optimal solutions [31]. Let the vector $[\bar{P}_\psi^1, \ldots, \bar{P}_\psi^N]$ be the power 
expenditure vector under the rate allocation policy $\psi$. We say 
that the rate allocation policy $\psi$ is Pareto optimal if and only if 
there exists no rate allocation policy $\zeta$ with the corresponding 
power expenditure vector $[\bar{P}_\zeta^1, \ldots, \bar{P}_\zeta^N]$ having the following 
properties, 

$$\forall i \in \{1, \ldots, N\}: \bar{P}_\zeta^i \le \bar{P}_\psi^i \;\wedge\; \exists i \in \{1, \ldots, N\}: \bar{P}_\zeta^i < \bar{P}_\psi^i. \tag{8}$$

The Pareto optimal solution is generally not unique and the set 
of Pareto optimal solutions is called the set of non-dominated 
solutions. The weighted sum approach is a common approach 
for solving a multi-objective optimization problem [31]. In 
this approach, one aggregates the objective functions into a 
single objective function. The resultant problem has a single 
objective function and $N$ constraints and can be expressed as, 

$$\text{Minimize } \bar{P} = \gamma^1 \bar{P}^1 + \ldots + \gamma^N \bar{P}^N, \quad \text{subject to } \bar{Q}^i \le \delta^i, \quad i = 1, \ldots, N, \tag{9}$$

where $\gamma = [\gamma^1, \ldots, \gamma^N]$ is the weight vector. It is generally 
assumed that $\gamma^i \in [0, 1]$, $\forall i$, and $\sum_{i=1}^{N} \gamma^i = 1$, implying that $\bar{P}$ is 
a convex combination of the individual powers. In general, the 
non-dominated set (i.e., the set of all Pareto optimal policies) 
may be a non-convex set. By varying the weight vector in the 
weighted sum approach we can determine the Pareto optimal 
policies within a convex subset of the non-dominated set. 
However, choosing the weight vector in order to obtain a 
particular solution is not straightforward. In the next subsection, 
we formulate the problem in (9) within the CMDP framework. 
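
The dominance relation in (8) and the weighted-sum objective in (9) can be illustrated with a short sketch. The helper names `dominates` and `weighted_sum` and the example vectors are ours and purely illustrative.

```python
from typing import Sequence


def dominates(p_a: Sequence[float], p_b: Sequence[float]) -> bool:
    """True if power vector p_a Pareto-dominates p_b.

    p_a must be no worse for every user and strictly better for at
    least one user.
    """
    no_worse = all(a <= b for a, b in zip(p_a, p_b))
    strictly_better = any(a < b for a, b in zip(p_a, p_b))
    return no_worse and strictly_better


def weighted_sum(powers: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarised objective: a convex combination of per-user powers."""
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(w * p for w, p in zip(weights, powers))


if __name__ == "__main__":
    print(dominates([1.0, 2.0], [1.0, 3.0]))   # second user strictly better
    print(dominates([1.0, 3.0], [2.0, 2.0]))   # incomparable vectors
    print(weighted_sum([2.0, 4.0], [0.5, 0.5]))
```

Two incomparable vectors, such as those in the second call, can both be Pareto optimal; different weight vectors then select between them.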
C. The CMDP Framework 

Let $\mathbf{X}_n \triangleq [X_n^1, \ldots, X_n^N]$ and $\mathbf{Q}_n \triangleq [Q_n^1, \ldots, Q_n^N]$. The 
state of the above system $\mathbf{S}_n$ at time $n$ can be described by 
the tuple $\mathbf{S}_n = [\mathbf{Q}_n, \mathbf{X}_n]$, comprising the queue length and 
the channel state of each of the $N$ users. Note that the system 
state space $\mathcal{S} = \mathcal{Q}^N \times \mathcal{X}^N$ is discrete and finite. Let $\{\mathbf{S}_n\}$ 
denote the state process. In each slot, the scheduler sets the rate 
vector $\mathbf{U}_n = [U_n^1, \ldots, U_n^N]$, where $U_n^i = I_n^i R_n^i$. $\mathbf{U}_n$ 
takes values from the finite action space $\mathcal{U}^N$. $\{\mathbf{U}_n\}$ denotes 
the control process. This problem has the structure of a CMDP 
with finite state and action spaces. Since we are considering 
average power expended and average delay suffered, it is an 
average cost CMDP. The scheduler objective is to determine 
an optimal rate allocation policy, i.e., a mapping from past 
history of states and actions to a rate allocation vector $\mathbf{U}_n$ 
in every slot $n$. For a CMDP with finite state and action 
spaces, it is well known that an optimal stationary randomized 
policy exists [32], i.e., the rate allocation policy is a mapping 
from the current system state to a probability distribution on 
the set of rate allocation vectors. However, the traditional 
computational approaches based on Linear Programming [32] 
cannot be used to determine the optimal policy because of the 
following reasons: 

1) Large state space: The system state space is large even 
for very few users. We have already illustrated this 
with an example in Section I. Moreover, the state space 
grows exponentially with number of users, hence the 
computational complexity of the traditional approaches 
also grows exponentially with number of users. 

2) Unknown user/system statistics: The probability distributions 
of $X_n^i$ and $A_n^i$ are unknown. The traditional 
approaches rely on the knowledge of these distributions 
for determining the optimal policy. 

We now extend the approach suggested in [28] to determine 
an optimal policy for the problem in (9). 

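Let user $i$ be scheduled in slot $n$. Then the state of the 
system immediately after user $i$ transmits the data can be 
represented as $\mathbf{S}_n = [\mathbf{Q}_n, \mathbf{X}_n] = [(q_n^1, \ldots, \max(0, q_n^i - u_n^i), \ldots, q_n^N), (x_n^1, \ldots, x_n^N)]$. 
Let $\mathbf{A}_n = [a_n^1, \ldots, a_n^N]$ denote 
the arrival vector in slot $n$. The state of the system at the 
beginning of slot $n + 1$ can be represented as 
$\mathbf{S}_{n+1} = [\mathbf{Q}_{n+1}, \mathbf{X}_{n+1}] = [(q_n^1 + a_{n+1}^1, \ldots, \max(0, q_n^i - u_n^i) + a_{n+1}^i, \ldots, q_n^N + a_{n+1}^N), (x_{n+1}^1, \ldots, x_{n+1}^N)]$. 
The queue transition 
equation in the vector form can be written as, 

$$\mathbf{Q}_{n+1} = \max(\mathbf{0}, \mathbf{Q}_n - \mathbf{U}_n) + \mathbf{A}_{n+1}. \tag{10}$$

Let $\boldsymbol{\Delta} = [\delta^1, \ldots, \delta^N]$ denote the delay constraint vector. The 
problem in (9) can be converted into an unconstrained problem 
using the Lagrangian approach. The unconstrained problem 
can be expressed as, 

$$\text{Minimize } \bar{P} + \sum_{i=1}^{N} \lambda^i (\bar{Q}^i - \delta^i), \tag{11}$$

where $\lambda^i \ge 0$ is the Lagrange Multiplier (LM) associated with the delay constraint of user $i$. 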

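The immediate cost $b(\cdot, \cdot, \cdot)$ incurred in scheduling a user $i$ in 
state $\mathbf{S}_n$ when the LM vector is $\boldsymbol{\lambda}$ and the rate vector is $\mathbf{U}_n$ (user $i$ 
is scheduled in slot $n$) can be expressed as, 

$$b(\boldsymbol{\lambda}, \mathbf{S}_n, \mathbf{U}_n) = P(X_n^i, U_n^i) + \sum_{j=1}^{N} \lambda^j (Q_n^j - \delta^j). \tag{12}$$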

Let $\mathbf{s}^0 = [\mathbf{q}^0, \mathbf{x}^0]$ denote a fixed state. Let $V(\cdot)$ denote the 
dynamic programming value function based on the state $\mathbf{S}$ 
reached immediately after taking the scheduling decision but 
before the arrivals. Let $\mathcal{F}_n = [\mathcal{F}_n^1, \ldots, \mathcal{F}_n^N]$ be the set of 
feasible¹ rate vectors in slot $n$. Let $\{f_n\}$ and $\{e_n\}$ be two 
sequences that have the following properties, 

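$$f_n \to 0, \quad e_n \to 0, \quad \sum_n (f_n)^2 < \infty, \quad \sum_n (e_n)^2 < \infty, \tag{13}$$

$$\sum_n f_n = \infty, \quad \sum_n e_n = \infty, \tag{14}$$

$$\sum_n \left(f_n^2 + e_n^2\right) < \infty, \quad \lim_{n \to \infty} \frac{e_n}{f_n} = 0. \tag{15}$$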

The significance of these properties is explained later. We now 
present an optimal online primal-dual algorithm for solving the 
constrained problem in (9): 

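$$\begin{aligned}
\mathbf{U}_{n+1} = \arg\min_{\mathbf{v} \in \mathcal{F}_{n+1}} \Big\{ & (1 - f_n) V_n(\mathbf{S}_n) + f_n \times \big\{ b(\boldsymbol{\lambda}, (\mathbf{Q}_n + \mathbf{A}_{n+1}, \mathbf{X}_{n+1}), \mathbf{v}) \\
& + V_n((\mathbf{Q}_n + \mathbf{A}_{n+1} - \mathbf{v}, \mathbf{X}_{n+1})) - V_n(\mathbf{s}^0) \big\} \Big\},
\end{aligned} \tag{16}$$

$$\begin{aligned}
V_{n+1}(\mathbf{S}_n) = {} & (1 - f_n) V_n(\mathbf{S}_n) + f_n \times \big\{ b(\boldsymbol{\lambda}, (\mathbf{Q}_n + \mathbf{A}_{n+1}, \mathbf{X}_{n+1}), \mathbf{U}_{n+1}) \\
& + V_n(\mathbf{Q}_n + \mathbf{A}_{n+1} - \mathbf{U}_{n+1}, \mathbf{X}_{n+1}) - V_n(\mathbf{s}^0) \big\}.
\end{aligned} \tag{17}$$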

The algorithm in (16) and (17) is an online version of the 
well known Relative Value Iteration Algorithm (RVIA) [33]. 
It iteratively determines the optimal value function and hence 
the optimal policy one state at a time for a fixed value of 
the LM vector $\boldsymbol{\lambda}$. To determine the optimal LM vector we 
augment the above algorithm with a dual LM iteration: 

$$\boldsymbol{\lambda}_{n+1} = \Lambda[\boldsymbol{\lambda}_n + e_n(\mathbf{Q}_n - \boldsymbol{\Delta})], \tag{18}$$

where $\Lambda$ is a projection operator for ensuring that the LMs 
are non-negative and finite. The square-summability conditions 
in (13) ensure that the sequences $\{f_n\}$ and $\{e_n\}$ 
converge to 0 sufficiently fast to eliminate the noise effects 
when the iterates are close to their optimal values $V^*(\cdot)$ and 
$\boldsymbol{\lambda}^*$, while the divergence conditions in (14) ensure that they 
do not approach 0 too rapidly, which avoids convergence of the 
algorithm to non-optimal values. Furthermore, (15) ensures that the update rates 
of the primal iterations, i.e., the value function iterations, and the 
dual iterations, i.e., the LM iterations, are different. Since $e_n$ 
approaches 0 much faster than $f_n$, the update rate of the value 
function iterations is much higher than the update rate of the 
LM iterations. This ensures that even though both the primals 
and duals are updated simultaneously, both converge to their 
optimal values [34]. (16), (17) and (18) constitute the optimal 
algorithm. The proof of optimality is exactly similar to that in 
[28]. 

¹ A rate is feasible based on the peak power constraints and the buffer 
occupancy. 
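
As a rough illustration of the two-timescale step sizes and the projected dual update in (18), consider the sketch below. The particular choices f_n = n^(-0.7) and e_n = 1/n, the cap `lam_max`, and all function names are our assumptions; they are one of many pairs satisfying the stated conditions and are not taken from the paper.

```python
from typing import Sequence


def f_step(n: int) -> float:
    """Primal (value function) step size; decays slowly."""
    return 1.0 / n ** 0.7


def e_step(n: int) -> float:
    """Dual (Lagrange multiplier) step size; decays faster than f_step."""
    return 1.0 / n


def lm_update(lam: Sequence[float], queue: Sequence[float],
              delta: Sequence[float], n: int,
              lam_max: float = 1e3) -> list[float]:
    """Projected dual ascent step for every user's multiplier.

    Each multiplier grows when the queue exceeds its target and
    shrinks otherwise; the result is clipped to [0, lam_max].
    """
    step = e_step(n)
    return [min(max(l + step * (q - d), 0.0), lam_max)
            for l, q, d in zip(lam, queue, delta)]


if __name__ == "__main__":
    print(lm_update([0.0, 5.0], [10, 2], [4, 6], n=1))
    print([round(e_step(n) / f_step(n), 4) for n in (1, 10, 100, 1000)])
```

Both sequences are square-summable but not summable, and the ratio printed in the last line decreases toward zero, so the multipliers move on a slower timescale than the value function.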

However, compared to the single user case, the state space 
here is too large for the algorithm to converge in a reasonable 
number of iterations. We therefore motivate an alternate approach. 
We incorporate the possibility of transmitter induced 
errors in the single user scenario. We then motivate the multi- 
user solution by making use of this extension to the single 
user scenario. 



IV. Single User Scenario in Presence of 
Transmitter Errors 

We describe the scenario in brief. Consider a point to point 
transmission system over a fading wireless channel. Time is 
divided into slots of unit duration. We consider the block 
fading model as described in Section II. The scheduler is 
unaware of the probability distribution of the arrivals and the 
channel state in each slot. The objective is to minimize the 
average transmission power subject to average packet delay 
constraints. In [28], we have determined an online algorithm 
that determines the optimal transmission rate in each slot so as 
to minimize the average power expenditure subject to average 
packet delay constraints. We consider the following extension 
to the above problem: suppose that after the online algorithm 
has determined the rate $U_n \in \mathcal{F}_n$, with a certain unknown 
random probability $\theta_n \in [0, 1]$, the transmitter is unable to 
proceed with the transmission. We assume that the probability 
distribution of $\theta_n$ is not known. Under this assumption, the 
queue evolution equation can be expressed as, 

$$Q_{n+1} = Q_n + A_{n+1} - I_n U_n, \tag{19}$$

where $I_n$ is an indicator variable that is set to 1 if the 
transmitter is successful in transmitting the packets and is set 
to 0 otherwise. We now formulate the rate allocation problem 
for this scenario. The long term power expenditure can be 
expressed as, 



$$\bar{P}_\theta = \limsup_{M \to \infty} \frac{1}{M} \mathbb{E} \sum_{n=1}^{M} P(X_n, I_n U_n). \tag{20}$$

The average queue length over a long period of time can be 
expressed as, 

$$\bar{Q}_\theta = \limsup_{M \to \infty} \frac{1}{M} \mathbb{E} \sum_{n=1}^{M} Q_n. \tag{21}$$

Hence the rate allocation problem can be stated as, 

$$\text{Minimize } \bar{P}_\theta \text{ subject to } \bar{Q}_\theta \le \delta. \tag{22}$$

Note that the problem in (22) has the structure of a CMDP 
with the state space of the single user case and the average cost 
criterion. The objective is to determine an optimal policy $\mu^*$ 
such that the power expended under this policy is the minimum 
possible while satisfying the delay constraint. 
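
The effect of transmitter induced errors on the queue can be sketched with a small simulation. Everything below is a hypothetical illustration: the arrival model (uniform on {0, 1, 2}), the fixed policy (send at most 2 bits), a constant failure probability, and the function names are our assumptions, and the queue update reflects our reading of the evolution equation in this section.

```python
import random


def queue_step(q: int, u: int, success: bool, arrivals: int) -> int:
    """Queue update when a transmission attempt may fail.

    On failure nothing leaves the buffer, so the queue only grows.
    """
    if not 0 <= u <= q:
        raise ValueError("rate must lie between 0 and the buffer content")
    sent = u if success else 0
    return q + arrivals - sent


def simulate(num_slots: int, fail_prob: float, seed: int = 0) -> float:
    """Average queue length under a naive fixed-rate policy."""
    rng = random.Random(seed)
    q, total = 0, 0
    for _ in range(num_slots):
        arrivals = rng.randint(0, 2)          # hypothetical arrival model
        u = min(q, 2)                         # hypothetical policy
        success = rng.random() >= fail_prob   # transmitter may fail
        q = queue_step(q, u, success, arrivals)
        total += q
    return total / num_slots


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.6):
        print(p, simulate(5000, p))
```

In the multi-user setting, a slot in which a user is not scheduled plays the same role as a failed transmission here, which is what motivates this extension.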



A. The Primal Dual Approach 

The constrained problem in (22) can be converted into an
unconstrained problem using the Lagrangian approach [32].
Let $\lambda \ge 0$ be a real number called the Lagrange Multiplier
(LM). Let $\mathcal{B}$ be the set $\{0, 1\}$. Let $c : \mathbb{R}^+ \times \mathcal{Q} \times \mathcal{X} \times \mathcal{B} \times \mathcal{U} \to \mathbb{R}$
be defined as follows,

$$c(\lambda, Q_n, X_n, I_n, U_n) = P(X_n, I_n U_n) + \lambda (Q_n - \bar{\delta}), \tag{23}$$

where $U_n$ is determined using the rate allocation policy $\mu :
\mathcal{Q} \times \mathcal{X} \to \mathcal{U}$. The unconstrained problem is to minimize

$$L(\mu, \lambda) = \limsup_{M \to \infty} \frac{1}{M} \mathbb{E} \sum_{n=1}^{M} c(\lambda, Q_n, X_n, I_n, \mu(Q_n, X_n)). \tag{24}$$

$L(\cdot, \cdot)$ is called the Lagrangian. Our objective is to determine
the optimal rate allocation policy $\mu^*$ and the optimal LM $\lambda^*$ such
that the following saddle point optimality condition is satisfied:

$$L(\mu^*, \lambda) \le L(\mu^*, \lambda^*) \le L(\mu, \lambda^*). \tag{25}$$



For a fixed LM $\lambda$, the problem is an unconstrained Markov
Decision Problem (MDP) with finite state and action spaces
with the average cost criterion. The following dynamic programming
equation [35] gives a necessary condition for optimality
of a solution:

$$V(q, x) = \min_{r \in \mathcal{F}} \Big[ c(\lambda, q, x, I, r) - \beta + \sum_{a', x'} p\big((q, x), r, (q + a' - r, x')\big) V(q + a' - r, x') \Big], \quad a' \in \mathcal{A},\ x' \in \mathcal{X}, \tag{26}$$

where $V(\cdot, \cdot)$ is the value function and $\beta \in \mathbb{R}$ is the unique
optimal power expenditure. Let $(q^0, x^0) \in \mathcal{Q} \times \mathcal{X}$ be a fixed
state. If we impose $V(q^0, x^0) = 0$, then $V(\cdot, \cdot)$ is unique [35].
$p(s, r, s')$ is the probability of reaching a state $s'$ upon taking
an action $r$ in state $s$. The traditional approaches for computing
the optimal policy for an unconstrained average cost MDP,
such as the Relative Value Iteration Algorithm [35], require
the knowledge of $p(\cdot, \cdot, \cdot)$, which in this case depends
on the probability distributions of the arrivals and channel
states, which are not known. Note that determining the optimal
value function as defined in (26) is not sufficient because
the unconstrained solution for a particular $\lambda$ does not ensure
that the constraints would be satisfied. To ensure constraint
satisfaction, the optimal LM needs to be determined.



B. The Online Rate Allocation Algorithm 

We now present the rate allocation algorithm. Let the user
state at the beginning of slot $n$ be $(Q_n, X_n) = (q, x)$. Suppose
that $u$ bits are transmitted in slot $n$. The following primal-dual
algorithm can be used to compute the rate $U_{n+1} = r_{n+1}$ at
which the transmitter should transmit in slot $n+1$,

$$r_{n+1} = \arg\min_{v} \Big\{ (1 - f_n) V_n(\tilde{q}, \tilde{x}) + f_n \times \big\{ c(\lambda_n, \tilde{q} + A_{n+1}, X_{n+1}, 1, v) + V_n(\tilde{q} + A_{n+1} - v, X_{n+1}) - V_n(q^0, x^0) \big\} \Big\}, \tag{27}$$

$$V_{n+1}(\tilde{q}, \tilde{x}) = (1 - f_n) V_n(\tilde{q}, \tilde{x}) + f_n \times \big\{ c(\lambda_n, \tilde{q} + A_{n+1}, X_{n+1}, I_{n+1}, r_{n+1}) + V_n(\tilde{q} + A_{n+1} - I_{n+1} r_{n+1}, X_{n+1}) - V_n(q^0, x^0) \big\}, \tag{28}$$

$$\lambda_{n+1} = \Lambda[\lambda_n + e_n (Q_n - \bar{\delta})]. \tag{29}$$

These equations are explained below: 

1) (27), (28) and (29) constitute the rate allocation al- 
gorithm. It consists of two phases: rate determination 
phase and update phase. (27) constitutes the rate de- 
termination phase of the algorithm, i.e., it is used to 
determine the rate at which a user transmits in a slot if 
the transmission is successful. (28) is the primal iteration 
to determine the optimal value function and thereby the 
optimal policy, while (29) is the coupled dual iteration 
for determining the optimal LM. They constitute the 
update phase of the algorithm. 

2) If in a state $(Q_n, X_n) = (q, x)$, the transmitter decides
to transmit $u \le q$ bits, then $\tilde{q} = q - u$, and $\tilde{x} = x$.

3) (28) determines the optimal value function based on this
new virtual state $(\tilde{q}, \tilde{x})$. Note that the value function for
this new state is related to the usual value function as
$\tilde{V}(\tilde{q}, \tilde{x}) = \mathbb{E}_{A, X}[V(Q, X)]$.

4) The rate determination phase (27) determines the rate
assuming that the transmitter would be successful in
transmitting in slot $n+1$ ($I_{n+1}$ is assumed to be equal
to 1). However in (28), updating the value function
requires the knowledge of whether the transmission
is successful or not. This is because the immediate cost
function $c(\cdot, \cdot)$ depends on $I_{n+1}$, i.e., whether the
transmission is successful or not. Thus the update phase
updates the value function and LM in each slot based
on the success of the transmission.

5) $(q^0, x^0)$ is any pre-designated state. On the RHS in
(28), the value function corresponding to this state is
subtracted in order to keep the iterates bounded.

6) The LM iteration in (29) ensures that the specified delay
constraint is satisfied.

7) The sequences $f_n$ and $e_n$ have properties specified in
(13), (14) and (15). The reasons for imposing these
properties have been explained in Section III-C.
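The two phases described above can be sketched in code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class name `OnlineRateAllocator`, the power curve `(2**u - 1) / x`, the step sizes `n**-0.6` and `1/n` (chosen as a typical two-timescale pair, since the exact conditions (13)-(15) are not reproduced here), the projection interval `[0, lam_max]`, the buffer clipping, and the dictionary-backed value function are all hypothetical choices.

```python
from collections import defaultdict

def power(x, u):
    """Assumed convex power-rate curve with W * N0 normalised to 1."""
    return (2.0 ** u - 1.0) / x

class OnlineRateAllocator:
    def __init__(self, rates, delta, ref_state=(0, 1.0), lam_max=100.0, buffer_size=50):
        self.rates = rates            # admissible rates; should include 0
        self.delta = delta            # queue length (delay) constraint
        self.ref = ref_state          # pre-designated reference state (q0, x0)
        self.lam_max = lam_max        # projection bound for the LM (assumption)
        self.buffer_size = buffer_size
        self.V = defaultdict(float)   # value function on virtual states
        self.lam = 0.0
        self.n = 1

    def cost(self, q, x, success, r):
        """Immediate cost (23): power plus weighted constraint violation."""
        return power(x, success * r) + self.lam * (q - self.delta)

    def choose_rate(self, q_tilde, arrivals, x_next):
        """Rate determination phase (27), assuming the transmission succeeds.

        Terms of (27) that do not depend on the candidate rate are dropped,
        since they do not change the minimiser.
        """
        q = min(q_tilde + arrivals, self.buffer_size)
        feasible = [r for r in self.rates if r <= q] or [0]
        return min(feasible,
                   key=lambda r: self.cost(q, x_next, 1, r) + self.V[(q - r, x_next)])

    def update(self, q_tilde, x_tilde, arrivals, x_next, r, success):
        """Update phase: primal iteration (28), then dual iteration (29)."""
        f = 1.0 / self.n ** 0.6       # faster step size for the value function
        e = 1.0 / self.n              # slower step size for the LM
        q = min(q_tilde + arrivals, self.buffer_size)
        i = 1 if success else 0
        target = (self.cost(q, x_next, i, r)
                  + self.V[(q - i * r, x_next)] - self.V[self.ref])
        key = (q_tilde, x_tilde)
        self.V[key] = (1.0 - f) * self.V[key] + f * target
        self.lam = min(max(self.lam + e * (q - self.delta), 0.0), self.lam_max)
        self.n += 1
        return q - i * r, x_next      # next virtual state
```

In each slot a caller would invoke `choose_rate` once the new arrivals and channel state are known, learn whether the transmission went ahead, and then call `update` with that outcome.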



approximation based online algorithm, averages out this extra 
noise term and determines the optimal policy and the optimal 
LM. ■ 

V. An Online Primal Dual Algorithm For the 
Multi-user Problem 

In this section, we propose a suboptimal approach to solve
the problem in (7). To avoid the state space explosion, in the
proposed algorithm, we determine the rate $R_n^i \in \mathcal{F}^i$ for a user
$i$ in a slot $n$, if it is scheduled, based on its state $S_n^i = [Q_n^i, X_n^i]$
alone instead of the entire system state $\mathbf{S}_n = [\mathbf{Q}_n, \mathbf{X}_n]$. Note
that $S_n^i \in \mathcal{Q} \times \mathcal{X}$. The rate $R_n^i$ is determined using a rate
allocation policy $\mu^i$, i.e., a mapping from the history of states
and rate allocations for user $i$ to its transmission rate. Once
the rate $R_n^i$ for each user $i$ is determined, the next task is to
determine a user to be scheduled in that slot. The user selection
policy $\kappa$ is a mapping, $\kappa : \mathcal{F}^1 \times \cdots \times \mathcal{F}^N \to \{1, \ldots, N\}$.

A. Rate Allocation Algorithm for a User 

The rate allocation algorithm for each user behaves as if 
it were controlled by a single user policy in the presence 
of transmitter errors as explained in Section IV. Each user
$i$ determines the rate $R_{n+1}^i$ at which it would transmit in slot
$n + 1$ if it were to be scheduled in slot $n + 1$ and reports
this rate to the base station. The base station uses the user
selection algorithm to schedule a user. The users who are 
not scheduled in a slot update their value function assuming 
transmitter errors, while the user who is scheduled updates its 
value function assuming successful transmission. 

Remark 2: In the case of the single user scenario with 
transmitter errors, the probability with which a transmission 
is unsuccessful is independent of the scheduler action, i.e., 
the transmission rate determined by the online algorithm. In 
the multi-user scenario, this independence does not hold. This 
makes the problem a multi-agent learning problem [36], [37] 
where each agent (user) attempts to learn the optimal strategy 
and the actions taken by an agent (a user) influence the actions
taken by the other agents (users).

B. User Selection Algorithm 

The user selection algorithm is simple: select the user with
the largest $R_n^i$, i.e., select the user with the best rate. The
intuition behind this is the following. The rate allocation
algorithm of a user $i$ would direct it to transmit at a high rate
$R_n^i$ under two circumstances: either the channel condition for
that user is very good, in which case transmission at a high rate
saves power, or the delay constraints of that user are not being
satisfied. Thus selecting a user with a high rate results in either
power savings or the user delay constraint being satisfied.
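The selection rule amounts to a single pass over the reported rates. A minimal sketch follows; the function name `select_user` and the tie-breaking rule (lowest index wins) are assumptions, since the text does not say how ties are resolved.

```python
def select_user(rates):
    """Return the index of the user reporting the largest rate.

    Ties are broken in favour of the lowest index (an assumption).
    """
    if not rates:
        raise ValueError("at least one user must report a rate")
    return max(range(len(rates)), key=lambda i: rates[i])
```

For example, `select_user([2, 5, 5, 1])` picks the user at index 1.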



C. Proof of Convergence 

Theorem 1: For the rate determination algorithm (27), (28)
and (29), the iterates $(V_n, \lambda_n) \to (V, \lambda^*)$.

Proof: The proof of convergence is similar to
that in [28]. The probability of transmission failure in each slot
serves as an extra noise term. The algorithm being a stochastic


C. Algorithm Details and Implementation 

The rate allocation algorithm is implemented on the user
devices while the user selection algorithm is implemented at
the base station, as illustrated in Figure 2. From (27) note that
the rate determination phase requires $X_n^i$, i.e., the knowledge
of the channel state at the base station. The communication
overhead incurred by the base station in informing a user of
the channel state perceived by it depends on the number of
states used to represent the channel. We represent the channel
using 8 states. Hence the base station needs 3 bits per slot in
order to inform a user of the channel state perceived by it. The
users inform the base station of the rate at which they would
transmit if they were to be scheduled. We allocate 3 bits
for conveying this information, i.e., the system can employ
8 rates. The user selection algorithm then determines the user
to be scheduled and all the users are informed about this
decision. The rate allocation algorithm at each user then enters
the update phase where the value function and the LM for
each user are appropriately updated using (28) and (29). The
algorithm thus continues in each slot $n$. The rate allocation
algorithm that is executed at each user device is illustrated in
Algorithm 1 where steps 4-8 represent the rate determination
phase, while steps 10-14 represent the update phase. The user
selection algorithm executed at the base station is detailed in
Algorithm 2.

Fig. 2. Solution schematic
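The signalling overhead quoted above follows from the number of bits needed to index a finite set of levels. The helper below is a small worked sketch; the function name `bits_needed` is hypothetical and the calculation only counts the index bits, ignoring any framing or protocol headers.

```python
def bits_needed(levels):
    """Smallest number of bits that can index `levels` distinct values."""
    if levels < 1:
        raise ValueError("levels must be positive")
    return (levels - 1).bit_length()

# Per slot and per user: channel state sent downlink, rate reported uplink.
downlink_bits = bits_needed(8)   # 8 channel states
uplink_bits = bits_needed(8)     # 8 transmission rates
```

With 8 channel states and 8 rates this gives 3 bits in each direction per user per slot, matching the figures in the text.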

D. Discussion 

Here we discuss certain aspects of the online algorithm: 

1) Computational complexity: The computational complex- 
ity of the rate allocation algorithm executed at a user 
device is independent of the number of users in the 
system. This is because the rate allocation algorithm
for any user $i$ is directly dependent on the user $i$ state
$S^i$ only and is independent of the states of the other
users. The user selection algorithm has to determine
the maximum of $N$ numbers and hence is linear in $N$.
Thus the computational complexity of the user selection 
algorithm grows only linearly with the number of users. 

2) An auctioning interpretation: The solution can be inter- 
preted as an auction, where the user selection algorithm 



Initialize the value function matrix $V^i(q, x) \leftarrow 0$ $\forall q \in \mathcal{Q}, x \in \mathcal{X}$
Initialize LM $\lambda^i \leftarrow 0$
Initialize slot counter $n \leftarrow 1$
Initialize queue length $q^i \leftarrow 0$
Initialize channel states $x^i \leftarrow 0$, $\tilde{x}^i \leftarrow 0$
Reference state $s^{i,0} = (0, x^1)$, where $x^1 \in \mathcal{X}$
while TRUE do
    while Base station has not informed the channel state $x^i$ do
        wait
    end while
    Determine the number of arrivals $A_{n+1}^i = a^i$ in the current slot
    Determine the queue length $q^i$ in the current slot
    Use the rate determination phase of the rate allocation algorithm, i.e., (27), to determine the rate $r^i$ for transmission
    Determine the power $P(x^i, r^i)$ required to transmit $r^i$ bits using (3)
    Inform the base station of the rate $r^i$
    while Base station has not scheduled a user do
        wait
    end while
    if user $i$ is scheduled then
        $w \leftarrow r^i$
    else
        $w \leftarrow 0$
    end if
    Update the $(\tilde{q}^i, \tilde{x}^i)$ component of the value function matrix using (28). Rest of the components of the matrix remain unchanged
    Update the LM $\lambda^i$ using (29) ($Q_n^i = q^i$)
    $\tilde{q}^i \leftarrow q^i - w$
    $\tilde{x}^i \leftarrow x^i$
    $n \leftarrow n + 1$
end while

Algorithm 1: The Rate Allocation Algorithm at the User i
Device



auctions each time slot. The users bid in the form of 
their transmission rates to the user selection algorithm, 
which allocates the time slot to the user bidding the 
highest rate. The rate bid by a user is dependent on 
its channel state and queue length constraint violation 
(i.e., the difference between the current queue length 
and the queue length constraint). If the channel state 
is quite good and queue constraint violation is large, 
the user bids a high rate. This is because transmitting 
at a high rate when the channel state is good saves 
power, while doing it when the queue length constraint 
violation is large aids in satisfying the delays. Note that 
the users do not bid unnecessarily high rates because 
that might result in higher power consumption. For a 
user, not winning an auction in a certain slot, implies 
that other users either have better channel conditions or 
higher queue length constraint violation or both. If a

1: while TRUE do
2:     for $i \in 1, \ldots, N$ do
3:         Estimate the channel state $X_{n+1}^i = x^i$ in the current slot for user $i$
4:         Inform $x^i$ to user $i$
5:     end for
6:     while Rate of each user is not known do
7:         wait
8:     end while
9:     Determine the user $k$ who has the highest rate
10:    Schedule user $k$ in the current slot
11: end while



algorithm. We determine the average delays experienced by the 
users under our algorithm and also the power expended by the 
users under our algorithm for the same maximum transmission 
power in each slot as in the M-LWDF scheme. We compare 
the average power expended by the users under our
algorithm with that expended under the M-LWDF scheme.
We perform the simulations within the framework of an IEEE 
802.16 system. Next, we provide some details regarding the 
IEEE 802.16 system. 

B. The IEEE 802.16 System 

The IEEE 802.16 standard specifies two modes for sharing 
the wireless medium: point-to-multipoint (PMP) and mesh. 
In this paper, we concentrate on the PMP mode where a
centralized base station (BS) serves multiple subscriber stations
(SSs). We consider the uplink (UL) transmissions. IEEE
802.16 medium access control (MAC) specifies four different
scheduling services in order to meet the QoS requirements
of various applications. These are: unsolicited grant service
(UGS) (for real-time applications with strict delay require- 
ments), real-time polling service (rtPS) (for real-time appli- 
cations with less stringent delay requirements), non real-time 
polling service (nrtPS) and best effort (BE) (for applications 
that do not have any delay requirements). However, unlike BE 
connection, nrtPS connection is reserved a minimum amount 
of bandwidth. We consider the residential scenario as in [38]. 
It consists of a BS providing Internet access to the subscribers. 
Although the standard does not specify any QoS class for 
providing average delays, we argue that the nrtPS must be 
extended to cater to the average delay requirements of the 
users. The unicast polling service of nrtPS can be extended to 
inform a user the channel state perceived by the base station 
as well as to determine the rate at which a user would transmit 
if it were to be scheduled. The scheduling algorithm can thus
be implemented as a part of nrtPS.

The system can be operated in either time division duplex 
(TDD) or frequency division duplex (FDD) mode. We assume 
the FDD mode of operation where all SSs have full-duplex 
capability. We consider a single carrier system (WirelessMAN-SC)
with a frame duration of 1 msec and bandwidth $W$ of 10
MHz. We assume that the users transmit at a rate such that data
is delivered reliably to the base station. Hence we do not consider
retransmissions and Automatic Repeat Request (ARQ).
The SSs employ the following modulations: 64 Quadrature
Amplitude Modulation (QAM), 16 QAM, Quadrature Phase
Shift Keying (QPSK), and QPSK with 1/2 rate convolutional code,
which provide us with 4 rates of transmission.

C. Simulation Setup and Results 

Internet traffic is modeled as a web traffic source [38], [39]. 
Packet sizes are drawn from a truncated Pareto distribution 
(shape factor 1.2, mode = 2000 bits, cutoff threshold = 10000 
bits) which provides us with an average packet size of 3860 
bits. In each time frame, we generate the arrivals for all the
users using Poisson distribution¹. Arrivals are generated in

¹ AA does not rely on the Poisson arrival process of the users; we simulate
using the Poisson process only for the purpose of illustration.



Algorithm 2: The User Selection Algorithm at the Base 
Station 

user does not win the auction for a certain number of 
slots successively, its queue length grows thus forcing it 
to bid a higher rate. Motivated by this interpretation, we 
refer to the scheduling scheme proposed in this paper as 
the Auctioning Algorithm (AA). 

VI. Simulation Setup and Results 

We demonstrate the performance of our algorithm under 
the IEEE 802.16 [1] framework through our simulations in a 
discrete event simulator. Specifically, we intend to demonstrate 
the following: 

1) The algorithm satisfies the delay constraints of all the 
users. 

2) The algorithm is efficient in terms of the power con- 
sumed for each of the users. Moreover, power consumed 
is commensurate with the delay requirements, average 
arrival rates and the channel states of the users. 

3) The sum power consumed under our algorithm is 
marginally more than that consumed under the optimal 
algorithm in Section III-C. 

4) Average power expended by the users under our algo- 
rithm, is much less than that expended under the M- 
LWDF scheme [13]. 

A. M-LWDF Algorithm Details 

The M-LWDF scheduler [13] attempts to minimize the user
delay. It also considers the probability with which a user's
queue length is allowed to exceed a certain target queue length.
We assume that this probability is the same for all the users
and hence adapt the M-LWDF scheme for our scenario by
ignoring it in the present simulations. Specifically, the adapted
M-LWDF scheme schedules a user $i$ in each slot such that,

$$i = \arg\max_{j} T^j \times U^j, \tag{30}$$

where $T^j$ is the delay experienced by the head of the line
packet for user $j$. The M-LWDF scheme transmits at a constant
maximum power in each time slot. We first determine the
average delays experienced by the users under the M-LWDF
scheme for various transmission powers. We consider the
values of these delays to be the delay constraints for our


TABLE I

Comparison between Optimal Algorithm (OA) and the
Auctioning Algorithm (AA)

Delay        Achieved Delay   Achieved Delay   Power for   Power for
Constraint   for OA           for AA           OA          AA
3 msec       3.09119          3.09700          0.26359     0.26522
5 msec       3.57960          3.40470          0.24756     0.26097



an i.i.d. manner across frames. We fragment the packets into
fragments of size 2000 bits each. Fragments of size less than
2000 bits are padded with extra bits to make them of size 2000
bits. Since all fragments are of equal size, we determine the
transmission rate for users in terms of number of fragments
per frame instead of bits per frame. We simulate a Rayleigh
fading channel² for each user. For a Rayleigh model, channel
state $X^i$ is an exponentially distributed random variable with
probability density function given by $f_{X^i}(x) = \frac{1}{\alpha^i} e^{-x/\alpha^i}$,
where $\alpha^i$ is the mean of $X^i$. We know from (3) that the power
required for transmitting $u$ fragments of size $\ell$ bits when the
channel state is $x$ is given by $P(x, u) = \frac{W N_0}{x} \left( 2^{u\ell/W} - 1 \right)$,
where $N_0$ is the power spectral density of the additive white
Gaussian noise and $W$ is the received signal bandwidth.
We assume that the product $W N_0$ is normalized to 1. We
measure the sum of queuing and transmission delays of the
packets and ignore the propagation delays. In all the scenarios
described below, a single simulation run consists of running
the algorithm for 100000 frames and the results are obtained
after averaging over 20 simulation runs.
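The traffic figures quoted in this section can be cross-checked with a short calculation. The sketch below assumes that "truncated Pareto" means a Pareto distribution conditioned on the packet size not exceeding the cutoff; under that assumption the closed-form mean reproduces the quoted average of about 3860 bits. The function names are hypothetical, and the 0.1 packets/msec figure is the arrival rate used in the later scenarios.

```python
def truncated_pareto_mean(shape, mode, cutoff):
    """Mean of a Pareto(shape, mode) variable conditioned on being <= cutoff."""
    ratio = mode / cutoff
    partial_mean = shape * mode / (shape - 1.0) * (1.0 - ratio ** (shape - 1.0))
    return partial_mean / (1.0 - ratio ** shape)

def fragments_per_packet(packet_bits, fragment_bits=2000):
    """Number of fixed-size fragments after padding the last one (ceiling division)."""
    return -(-packet_bits // fragment_bits)

mean_bits = truncated_pareto_mean(shape=1.2, mode=2000.0, cutoff=10000.0)

# 0.1 packets/msec * mean_bits bits/packet * 1000 msec/sec, expressed in Mbits/sec.
rate_mbps = 0.1 * mean_bits * 1000.0 / 1e6
```

Under this reading the mean packet size is roughly 3862 bits, and an arrival rate of 0.1 packets/msec corresponds to roughly 0.386 Mbits/sec/user. A packet of the largest size, 10000 bits, occupies five fragments.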

Scenario 1: Comparison with the Optimal Algorithm: This
scenario demonstrates that the sum power expended by AA is
very close to that expended by the optimal algorithm (OA)
suggested in Section III-C. For this scenario, we assume
$N = 2$, i.e., two users. The channel state can be either
bad (0.1422, i.e., -8.47 dB) or good (2.0796, i.e., 3.18
dB). We assume a buffer of size 10 packets ($B = 10$)
at each user. In each frame the arrivals are generated using
the Poisson distribution with mean 0.05 packets/msec (0.184
Mbits/sec/user). Packet lengths are Pareto distributed with
parameters as discussed previously in this section. In each
frame, we generate a Rayleigh random variable with mean
0.9817 (-0.08 dB). If the value taken by the random variable
is greater than 2.0796, the channel state is assumed to be good,
else the channel state is assumed to be bad. The peak transmission
power in any slot is fixed at 3 Watts. We compare the sum
power for the two users for the two schemes in Table I. It can
be seen that both the schemes satisfy the delay constraints. The
power required by AA is marginally more than that required
for the OA.

For the rest of the scenarios, we discretize the channel into
eight equal probability bins, with the boundaries specified by
{ (-∞, -8.47 dB), [-8.47 dB, -5.41 dB), [-5.41 dB, -3.28
dB), [-3.28 dB, -1.59 dB), [-1.59 dB, -0.08 dB), [-0.08
dB, 1.42 dB), [1.42 dB, 3.18 dB), [3.18 dB, ∞) }. For each
bin, we associate a channel state and the state space $\mathcal{X}$ = {
-13 dB, -8.47 dB, -5.41 dB, -3.28 dB, -1.59 dB, -0.08

² AA does not rely on the Rayleigh channel; we simulate using a Rayleigh
channel only for the purpose of illustration.

TABLE II

Comparison between M-LWDF and the AA

Power     Achieved Delay   Achieved     Average Power   Average
Constr.   - M-LWDF         Delay - AA   - M-LWDF        Power - AA
1.5       28.71532         28.12900     0.07499         0.04206
2         28.18142         28.01677     0.09999         0.04737
2.5       22.57460         22.03332     0.12499         0.05530
3         22.12825         18.27730     0.14999         0.07007
3.5       21.99025         16.36487     0.17499         0.07026
4         20.09445         16.39282     0.19999         0.07073
4.5       20.09445         14.74980     0.22497         0.07074



dB, 1.42 dB, 3.18 dB}. This discretization of the state space
of $X^i$ has been justified in [19]. We assume $N = 20$, i.e.,
a system with 20 users and thereby 20 UL connections. We
assume that the number of users does not change during the
course of the simulation. Users are divided into two groups
(Group 1 and Group 2) of 10 users each.
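The bin boundaries listed above can be recomputed from the quantiles of an exponential distribution. The sketch below assumes a unit-mean exponential channel gain, which is an assumption rather than something stated in the text; the function name is hypothetical. Under that assumption the computed boundaries agree with the quoted ones to two decimal places, except for the first, which comes out near -8.74 dB rather than the quoted -8.47 dB.

```python
import math

def equal_probability_boundaries_db(num_bins, mean=1.0):
    """Interior boundaries (in dB) splitting an exponential variable into
    `num_bins` bins of equal probability.

    The k-th boundary is the k/num_bins quantile: -mean * ln(1 - k/num_bins).
    """
    return [10.0 * math.log10(-mean * math.log(1.0 - k / num_bins))
            for k in range(1, num_bins)]

boundaries_db = equal_probability_boundaries_db(8)
```

The seven interior boundaries, together with the open ends at minus and plus infinity, delimit the eight bins.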

Scenario 2: Comparison with the M-LWDF Algorithm: We
compare the power consumed by the M-LWDF algorithm with
the power consumed by the AA. We first simulate the M-LWDF
scheme [13]. In each frame, arrivals are generated with
a Poisson distribution with mean 0.1 packets/msec. Packet
lengths are Pareto distributed as explained above. This results
in an arrival rate of 0.386 Mbits/sec/user. We choose $\alpha^i$ =
0.9817 (-0.08 dB) $\forall i$. In each slot we generate $X^i$ using an
exponential distribution with mean $\alpha^i$. We determine the channel
state based on the bin that contains $X^i$ as explained above.
We perform multiple experiments. In successive experiments,
we fix the maximum transmission power at 1.5, 2, 2.5, 3, 3.5, 4, 4.5
Watts respectively. For each experiment, we determine the
average delays experienced by the users. Next we fix the
delays achieved by the M-LWDF algorithm as the delay
constraints for our algorithm. The maximum transmission
power is the same as that for the M-LWDF algorithm. We
determine the average power expended for each user under
the AA and compare it to that for the M-LWDF algorithm
in Table II (power constraint and average power expenditure
expressed in Watts, achieved delays expressed in milliseconds
(msec)). It can be seen that for all the experiments, the AA
satisfies the delay constraints. Moreover, the power expended
by each user under the AA is much less than that expended
under the M-LWDF algorithm. This is because the AA is able
to adapt the transmission power based on the channel state
and delay constraint violation.
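The size of the power saving can be read directly off Table II. The short calculation below uses only the average power columns of that table; the variable names are illustrative.

```python
# Average power (Watts) from Table II, ordered by power constraint 1.5 ... 4.5 W.
mlwdf_power = [0.07499, 0.09999, 0.12499, 0.14999, 0.17499, 0.19999, 0.22497]
aa_power = [0.04206, 0.04737, 0.05530, 0.07007, 0.07026, 0.07073, 0.07074]

# Fractional saving of AA relative to M-LWDF for each experiment.
savings = [1.0 - aa / ml for aa, ml in zip(aa_power, mlwdf_power)]
```

The saving is about 44% at the 1.5 W constraint and about 69% at the 4.5 W constraint, since the M-LWDF power grows with the constraint while the AA power levels off near 0.07 W.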

Scenario 3: In this scenario, we demonstrate that the AA 
satisfies the various user specified delay constraints. We con- 
sider two cases: symmetric and asymmetric. In each frame 
arrivals are generated with a Poisson distribution with mean 
0.1 packets/msec. Packet lengths are Pareto distributed with 
parameters detailed above. This results in an arrival rate of 
0.386 Mbits/sec/user. We choose $\alpha^i$ = 0.9817 (-0.08 dB) $\forall i$.
In each slot we generate $X^i$ using an exponential distribution
with mean $\alpha^i$. We determine the channel state based on
the bin that contains $X^i$ as explained above. We perform
multiple experiments. In the symmetric scenario, in successive 
experiments, the delay constraints of all the users are fixed at 
25, 50, 75, 100, 125, 150, 175 msec respectively. We measure 
the average delay experienced and the average power expended
by each user in each experiment. These quantities for a user
selected at random are plotted in Figure 3. In the asymmetric 
case, the delay constraint of the users in Group 1 are fixed at 
100 msec in each experiment, while the delay constraints of 
the users in Group 2 are fixed at 25, 50, 75, 100, 125, 150, 175 
msec in successive experiments. Average delay suffered by 
a user selected at random from Group 1 and Group 2 and 
power consumed by them are plotted in Figure 4. It can be 
observed from Figures 3(a) and 4(a) that the delay constraints 
are satisfied in both the cases. Moreover, from Figures 3(b) 
and 4(b) it can be observed that power expended is a convex 
decreasing function of the delay constraint imposed by the 
user. Larger delay constraints imply that much less power is
required to satisfy the constraint.

Scenario 4: In this scenario, we demonstrate that the AA 
satisfies the user specified delay constraints for various channel 
conditions. We consider two cases: symmetric and asymmetric. 
The delay constraints of all the users are kept constant at 
100 msec. For the symmetric case, we fix $\alpha^i$ as -13 dB,
-8.47 dB, -5.41 dB, -3.28 dB, -1.59 dB, -0.08 dB,
1.42 dB, $\forall i$ in successive experiments. The rest of the parameters
are the same as in Scenario 3. We measure the average delay
suffered by each of the users and the average power consumed
by each of them. These quantities are plotted in Figure 5. In
the asymmetric case, we maintain the average channel state
for users in Group 1 constant for all the experiments, i.e.,
$\alpha^i$ = -0.08 dB, $i \in \{1, \ldots, 10\}$. For the users in Group
2, i.e., for $i \in \{11, \ldots, 20\}$, the average channel state is
fixed at $\alpha^i$ = -13 dB, -8.47 dB, -5.41 dB, -3.28 dB,
-1.59 dB, -0.08 dB, 1.42 dB, in successive experiments.
Average delay suffered by a user in Group 1 and in Group 2
and power consumed by them are plotted in Figure 6. It can be
observed from Figures 5(a) and 6(a) that the delay constraints
are satisfied even for extremely poor channel conditions.
Moreover, from Figures 5(b) and 6(b) it can be observed
that the scheme is able to satisfy the delay constraints above
a certain average channel state³. Better channel conditions
imply that much less power is required to satisfy the delay
constraints.

Scenario 5: In this scenario we demonstrate the range of 
arrival rates for which the AA satisfies the user specified delay 
constraint of 100 msec. We consider two cases - symmetric 
and asymmetric. In the symmetric case, the arrival rates of all
the users are fixed at 0.2702 to 0.5018 Mbits/sec (0.05 to 0.12
packets/msec) in successive experiments. The rest of the parameters
ters are same as in Scenario 3. We measure the average delay 
suffered and average power expended by each user. These 
quantities for a user chosen at random are plotted in Figure 7. 
In the asymmetric case, the arrival rate of the users in Group 
1 is fixed at 0.386 Mbits/sec (0.15 packets/msec) for all the 
experiments, while the arrival rates of the users in Group 2
are increased from 0.1351 to 0.2509 Mbits/sec (0.07 to 0.13
packets/msec) in 8 steps in successive experiments. The rest of the
parameters are same as in Scenario 3. Average delay suffered 
by a user from Group 1 and Group 2 (each selected at random) 
and power consumed by them are plotted in Figure 8. It can be 

³ This average channel state is dependent on the peak transmission power.



observed from Figures 7(a) and 8(a) that the delay constraints 
are satisfied in both the cases. From Figures 7(b) and 8(b) it 
can be seen that power expended is an increasing function of 
the average arrival rates for the same delay constraint. The higher
the arrival rate, the higher the power expended.

VII. Conclusion 

In this paper, we have considered the problem of power 
efficient uplink scheduling in a TDMA system over a fading
wireless channel with the objective of minimizing user power 
expenditure under individual delay constraints. We have as- 
sumed that the user statistics are unknown, i.e., the proba- 
bility distributions of the user arrivals and channel states are 
unknown. We have formulated the problem under the CMDP 
framework. Determining the optimal policy under the CMDP 
framework faces two problems: state space explosion and 
unknown user statistics. To tackle state space explosion, we 
have suggested performing the rate allocation for a particular 
user based on its buffer occupancy and channel state only. The 
rate allocation algorithm is a learning algorithm that learns 
about the channel state and buffer occupancy of a user during 
system execution and determines its rate of transmission and 
hence takes care of the unknown user statistics. Once the rate 
allocation for all the users is done, the algorithm schedules
the user with the highest rate in a slot. We have performed
simulations in the IEEE 802.16 system. Our simulation results 
have demonstrated that the system is indeed able to satisfy the 
user specified delay constraints and comparison with the M- 
LWDF scheme indicates that power expended by the users is 
low. Moreover, the power expended is commensurate with the
QoS requirement: lower average arrival rates, better average
channel conditions and higher average delay requirements lead
to lower power expenditure.

Fig. 3. Variation of achieved delay and power consumed for various delay constraints - symmetric case
Fig. 4. Variation of achieved delay and power consumed for various delay constraints - asymmetric case
Fig. 5. Variation of achieved delay and power consumed for various channel conditions - symmetric case
Fig. 6. Variation of achieved delay and power consumed for various channel conditions - asymmetric case
Fig. 7. Variation of achieved delay and power consumed for various arrival rates - symmetric case
Fig. 8. Variation of achieved delay and power consumed for various arrival rates - asymmetric case



